Bringup Qwen2.5-1.5B by ChingTsai · Pull Request #3506 · AI-Hypercomputer/maxtext

ChingTsai · 2026-03-26T02:20:47Z

Description

Bringup Qwen2.5-1.5B

FIXES: b/495594907

Tests

Maxtext -> HF

python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml run_name=forward_pass_test_scanned model_name=qwen2.5-1.5b tokenizer_path=Qwen/Qwen2.5-1.5B-Instruct load_parameters_path=XXXX max_prefill_predict_length=4 max_target_length=8 dataset_type=synthetic scan_layers=true per_device_batch_size=1 skip_jax_distributed_system=True dtype=float32 --max_kl_div=0.015 --run_hf_model=True \
--hf_model_path=Qwen/Qwen2.5-1.5B-Instruct

Scanned
Unscanned
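For reference, here is a minimal sketch of the kind of gate that `--max_kl_div=0.015` implies. It assumes the checker turns each position's logits into a softmax distribution, computes KL(P_HF || P_MaxText) per token position, and fails if the worst position exceeds the threshold. The KL direction, the reduction over positions, and the helper names (`passes_kl_check` and friends) are assumptions for illustration, not taken from `forward_pass_logit_checker`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def passes_kl_check(hf_logits: Sequence[Sequence[float]],
                    maxtext_logits: Sequence[Sequence[float]],
                    max_kl_div: float) -> bool:
    """Hypothetical gate: the worst per-position KL(P_hf || P_maxtext) must not exceed max_kl_div.

    Assumption: each argument holds one row of vocabulary logits per token position.
    """
    if len(hf_logits) != len(maxtext_logits):
        raise ValueError("logits cover a different number of positions")
    worst = max(
        (kl_divergence(softmax(h), softmax(mt)) for h, mt in zip(hf_logits, maxtext_logits)),
        default=0.0,
    )
    return worst <= max_kl_div


if __name__ == "__main__":
    hf = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    maxtext = [[2.001, 1.0, 0.1], [0.5, 0.501, 3.0]]  # tiny numerical drift
    print(passes_kl_check(hf, maxtext, max_kl_div=0.015))  # True
```

Taking the worst position, rather than the mean, is the stricter choice: a single badly mismatched token still fails the check.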

HF -> Maxtext


python3 -m tests.utils.hf_checkpoint_conversion_checker --original_ckpt=hf_cache/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306 --converted_ckpt=qwen2.5-1.5b/hf_from_scanned
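The PR does not show how `hf_checkpoint_conversion_checker` compares the two checkpoints. The following is a hypothetical sketch of a tensor-by-tensor comparison between the original and round-tripped HF state dicts. It checks for missing or extra keys, size mismatches, and a max absolute difference above a tolerance. Plain lists of floats stand in for real tensors, and `compare_state_dicts` and its `atol` default are illustrative assumptions.

```python
from typing import Dict, List

Tensor = List[float]  # flattened tensor; a stand-in for real torch/numpy arrays


def compare_state_dicts(original: Dict[str, Tensor],
                        converted: Dict[str, Tensor],
                        atol: float = 1e-2) -> List[str]:
    """Return human-readable problems; an empty list means the checkpoints match."""
    problems = []
    # Key-set differences: weights dropped or invented by the conversion.
    problems += [f"missing in converted: {k}" for k in sorted(original.keys() - converted.keys())]
    problems += [f"unexpected in converted: {k}" for k in sorted(converted.keys() - original.keys())]
    # Value comparison for weights present in both checkpoints.
    for name in sorted(original.keys() & converted.keys()):
        a, b = original[name], converted[name]
        if len(a) != len(b):
            problems.append(f"{name}: size {len(a)} != {len(b)}")
            continue
        max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if max_diff > atol:
            problems.append(f"{name}: max abs diff {max_diff:.3g} > {atol}")
    return problems


if __name__ == "__main__":
    original = {"model.norm.weight": [1.0, 1.0], "model.embed_tokens.weight": [0.1, -0.2]}
    converted = {"model.norm.weight": [1.0, 1.0]}
    for problem in compare_state_dicts(original, converted):
        print(problem)  # missing in converted: model.embed_tokens.weight
```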

Log

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-03-26T02:25:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


github-actions · 2026-03-26T02:38:10Z

🤖 Hi @ChingTsai, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-03-26T02:40:36Z

🤖 I'm sorry @ChingTsai, but I was unable to process your request. Please see the logs for more details.

github-actions · 2026-03-26T03:49:46Z

🤖 I'm sorry @ChingTsai, but I was unable to process your request. Please see the logs for more details.

RissyRan

Thanks!

shuningjin · 2026-03-29T19:03:19Z

I noticed that we now have three scripts to verify to_huggingface.
(1) tests.utils.forward_pass_logit_checker. This is what we originally had and still commonly use.
(2) maxtext.checkpoint_conversion.compare_hf_ckpt. Introduced by PR 2903.
(3) tests.utils.hf_checkpoint_conversion_checker. Introduced by PR 3113.

@RissyRan: Could we have a unified test process for to_huggingface, as a follow-up?

For to_maxtext, we always use (1) as the test: convert HF1 -(to_maxtext)-> maxtext, then compare HF1 and maxtext via a logit check.
Similarly, for to_huggingface, we can also use (1): convert maxtext -(to_huggingface)-> HF2, then compare HF2 and maxtext via a logit check with --hf_model_path=$HF2.
Meanwhile, (2) and (3) appear to perform the same task: comparing HF1 and HF2 directly.
I would recommend using (1) for to_huggingface, to align with to_maxtext.

Example workflow

Using qwen3-0.6b for demonstration.

to_maxtext

# to_maxtext: HF1 -> maxtext
# HF1: Qwen/Qwen3-0.6B
# maxtext: gs://runner-maxtext-logs/to_maxtext_20260329/0/items

BASE_OUTPUT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329
python3 -m maxtext.checkpoint_conversion.to_maxtext \
src/maxtext/configs/base.yml model_name=qwen3-0.6b scan_layers=true \
base_output_directory=$BASE_OUTPUT_PATH hf_access_token=$HF_TOKEN \
hardware=cpu skip_jax_distributed_system=True \
attention=dot_product \
--eager_load_method=transformers --save_dtype=bfloat16

log: https://paste.googleplex.com/4580852571963392

# test to_maxtext: compare HF1 and maxtext

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=true attention=dot_product per_device_batch_size=1 model_name=qwen3-0.6b max_prefill_predict_length=4 max_target_length=4 async_checkpointing=false sparse_matmul=false ici_fsdp_parallelism=1 ici_expert_parallelism=1 checkpoint_storage_concurrent_gb=1024 weight_dtype=float32 dtype=float32 activations_in_float32=true matmul_precision=highest float32_logits=true float32_qk_product=true --max_kl_div=3e-4 \
hardware=cpu skip_jax_distributed_system=True \
--run_hf_model=true --hf_model_path=Qwen/Qwen3-0.6B tokenizer_path=Qwen/Qwen3-0.6B tokenizer_type=huggingface

log: https://paste.googleplex.com/5375600702390272

to_huggingface

# to_huggingface: maxtext -> HF2
# maxtext: gs://runner-maxtext-logs/to_maxtext_20260329/0/items
# HF2: /tmp/qwen3_hf_20260329

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
HF_PATH=/tmp/qwen3_hf_20260329 
python3 -m maxtext.checkpoint_conversion.to_huggingface \
src/maxtext/configs/base.yml \
model_name=qwen3-0.6b \
scan_layers=true load_parameters_path=$SCANNED_CKPT_PATH \
base_output_directory=$HF_PATH \
weight_dtype=bfloat16 \
skip_jax_distributed_system=true attention=dot_product

log: https://paste.googleplex.com/4844209799561216

# test to_huggingface: compare HF2 and maxtext

SCANNED_CKPT_PATH=gs://runner-maxtext-logs/to_maxtext_20260329/0/items
HF_PATH=/tmp/qwen3_hf_20260329 
python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=true attention=dot_product per_device_batch_size=1 model_name=qwen3-0.6b max_prefill_predict_length=4 max_target_length=4 async_checkpointing=false sparse_matmul=false ici_fsdp_parallelism=1 ici_expert_parallelism=1 checkpoint_storage_concurrent_gb=1024 weight_dtype=float32 dtype=float32 activations_in_float32=true matmul_precision=highest float32_logits=true float32_qk_product=true --max_kl_div=3e-4 \
hardware=cpu skip_jax_distributed_system=True \
--run_hf_model=true --hf_model_path=$HF_PATH tokenizer_path=Qwen/Qwen3-0.6B tokenizer_type=huggingface

log: https://paste.googleplex.com/5559004127428608
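The two logit checks above share every argument except `--hf_model_path`, so they can be generated from one template. This is a minimal sketch that reuses the flags, checkpoint path, and HF paths from this thread verbatim. `logit_check_cmd` is a hypothetical helper, and it only builds the argument lists without running anything.

```python
import shlex
from typing import List

# Flags copied verbatim from the qwen3-0.6b logit checks in this thread;
# only the two placeholders vary between the to_maxtext and to_huggingface tests.
TEMPLATE = (
    "python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml "
    "base_output_directory=gs://runner-maxtext-logs run_name=forward_logits_check "
    "load_parameters_path={ckpt} scan_layers=true attention=dot_product "
    "per_device_batch_size=1 model_name=qwen3-0.6b max_prefill_predict_length=4 "
    "max_target_length=4 async_checkpointing=false sparse_matmul=false "
    "ici_fsdp_parallelism=1 ici_expert_parallelism=1 checkpoint_storage_concurrent_gb=1024 "
    "weight_dtype=float32 dtype=float32 activations_in_float32=true matmul_precision=highest "
    "float32_logits=true float32_qk_product=true --max_kl_div=3e-4 "
    "hardware=cpu skip_jax_distributed_system=True "
    "--run_hf_model=true --hf_model_path={hf} "
    "tokenizer_path=Qwen/Qwen3-0.6B tokenizer_type=huggingface"
)

SCANNED_CKPT_PATH = "gs://runner-maxtext-logs/to_maxtext_20260329/0/items"
HF1 = "Qwen/Qwen3-0.6B"         # original HF model (to_maxtext source)
HF2 = "/tmp/qwen3_hf_20260329"  # output of to_huggingface


def logit_check_cmd(ckpt: str, hf_model_path: str) -> List[str]:
    """Build the argv for one logit check (hypothetical helper; nothing is executed)."""
    return TEMPLATE.format(ckpt=ckpt, hf=hf_model_path).split()


if __name__ == "__main__":
    print(shlex.join(logit_check_cmd(SCANNED_CKPT_PATH, HF1)))  # tests to_maxtext
    print(shlex.join(logit_check_cmd(SCANNED_CKPT_PATH, HF2)))  # tests to_huggingface
```

Keeping a single template makes it hard for the two directions to drift apart in precision or tolerance settings, which is the point of using (1) for both.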

ChingTsai force-pushed the jimmytsai/bring-up-qwen2_5-1_5b branch from b4688d0 to e35f17f on March 26, 2026 02:30

ChingTsai changed the title ~~bringup qwen2.5-1.5B~~ Bringup qwen2.5-1.5B Mar 26, 2026

ChingTsai added the gemini-review label Mar 26, 2026

ChingTsai marked this pull request as ready for review March 26, 2026 03:33

ChingTsai changed the title ~~Bringup qwen2.5-1.5B~~ Bringup Qwen2.5-1.5B Mar 26, 2026

RissyRan approved these changes Mar 26, 2026


bringup qwen2.5-1.5B

387df2d

ChingTsai force-pushed the jimmytsai/bring-up-qwen2_5-1_5b branch from e35f17f to 387df2d on March 27, 2026 08:12

ChingTsai wants to merge 1 commit into main from jimmytsai/bring-up-qwen2_5-1_5b
